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Abstract 

We consider the efficient minimization of a nonlinear, strictly convex functional 
with £i-penalty term. Such minimization problems appear in a wide range of applica¬ 
tions like Tikhonov regularization of (non)linear inverse problems with sparsity con¬ 
straints. In (2015 Inverse Problems 31 025005), a globalized Bouligand-semismooth 
Newton method was presented for fi-Tikhonov regularization of linear inverse prob¬ 
lems. Nevertheless, a technical assumption on the accumulation point of the sequence 
of iterates was necessary to prove global convergence. Here, we generalize this method 
to general nonlinear problems and present a modified semismooth Newton method for 
which global convergence is proven without any additional requirements. Moreover, 
under a technical assumption, full Newton steps are eventually accepted and locally 
quadratic convergence is achieved. Numerical examples from image deblurring and 
robust regression demonstrate the performance of the method. 

Keywords: global convergence, semismooth Newton method, £i-Tikhonov regular¬ 
ization, inverse problems, sparsity constraints, quadratic convergence 

Mathematics Subject Classification: 49M15, 49N45, 90C56 


1 Introduction 

We are concerned with the efficient minimization of 

OO 

min g{u)+ '^Wk\uk\, (1) 

where : £2 M is a twice Lipschitz-continuously differentiable and strictly convex func¬ 
tional, £2 = ^ 2 (H) and w = {'Wk)k is a positive weight sequence with Wk > wq > 0. 
Minimization problems of the form (1) appear in various applications from engineering 
and natural sciences. A well-known example is Tikhonov regularization for inverse prob¬ 
lems with sparsity constraints, e.g. medical imaging, geophysics, nondestructive testing or 
compressed sensing, see e.g. [16,20,26,55]. Here, one aims to solve a possibly nonlinear 
ill-posed operator equation Ar(u) = K: £2 ^ h- In practice, one has to reconstruct 
u G ^2 from noisy measurement data f*^ ~ f. In the presence of perturbed data, reg¬ 
ularization strategies are required for the stable computation of a numerical solution to 

‘Johannes Gntenberg University Mainz, Institnte of mathematics, Staudingerweg 9, D-55099 Mainz, 
Germany. E-Mail: hanse@uni-mainz.de, raasch@uni-mainz.de 


1 



2 


E. Hans, T. Raasch 


an inverse problem [16,49]. Applying Tikhonov regularization with sparsity constraints, 
one minimizes a functional consisting of a suitable discrepancy term g: £2 —?• IR and a 
sparsity promoting penalty term, see e.g. [13] and the references therein. Sparsity here 
means the a priori assumption that the unknown solution is sparse, i.e. u has only few 
nonzero entries. As an example, in the special case of a linear discrete ill-posed operator 
equation Ku = f, K: £2 ^ £2 linear, bounded and injective, f G £ 2 , one may choose the 
discrepancy term 51 (u) := ^jjATu —fjj^^ [16]. For nonlinear inverse problems like parameter 
identification problems, convex discrepancy terms from energy functional approaches may 
be considered, see e.g. [31,35,38,40]. Sparsity of the Tikhonov minimizer with respect to 
a given basis can be enforced by using the penalty term in ( 1 ), where the weights Wk act 
as regularization parameters, see e.g. [25,26,34] and the references therein. 

In current research, sparsity-promoting regularization techniques are widely used, see 
e.g. [6,13,24,26,33,34,38-40] and the references therein. Such recovery schemes usually 
outperform classical Tikhonov regularization with £2 coefficient penalties in terms of re¬ 
construction quality if the unknown solution is sparse w.r.t. some basis. This is the case 
in many parameter identification problems for partial differential equations with piecewise 
smooth solutions, like electrical impedance tomography [24,33] or inverse heat conduction 
scenarios [7]. 

There exists a variety of approaches for the numerical minimization of (1) in the 
literature. In the special case of a quadratic functional g, iterative soft-thresholding [13] 
as well as related approaches for general functionals g are well-studied, see e.g. [6,8,48]. 
Accelerated methods and gradient-based methods introduced in [4,20,38,41,54] often gain 
from clever stepsize choices. Homotopy-type solvers [42] and alternating direction methods 
of multipliers [55] besides many others are also state-of-the-art. 

Other popular approaches for the solution of (1) are semismooth Newton methods 
[9,52]. A semismooth Newton method and a quasi-Newton method for the minimization 
of (1) were proposed by Muoi et al. in the inhnite-dimensional setting [40], inspired by 
previous work of Herzog and Lorenz [26]. If g is convex and smooth, it was shown e.g. 
in [11,26,39], that u* G .^2 is a minimizer of (1) if and only if u* is a solution to the 
zero-finding problem F; .^2 ^ 2 ; 

F(u) := u - S.yw(u - 7 V 5 r(u)) = 0 (2) 

for any hxed 7 > 0, where S^(u) := (sgn(«fc)(jwfcj — I3k)+)k denotes the componentwise 
soft thresholding of u with respect to a positive weight sequence /3 = {(3k)k, ^g denotes 
the gradient of g and = max{rE, 0}. In [40], F from (2) was shown to be Newton differ¬ 
entiable, i.e. under a suitable assumption on g there exists a family of slanting functions 
G: £ 2 —^ L{£ 2 ,£ 2 ) with 


lim 

h^O 


F(u-Fh) 


F(u)-G(u + h)hll,, 

llh]].. 


( 3 ) 


see also [9, 26, 52] for the definition of Newton derivatives. A local semismooth Newton 
method was defined in [40] by 

= -F(u(j')), (4) 

yO'+i) _ j = 0,1,.... (5) 


with a specially chosen G, cf. [9,52]. In [40], locally superlinear convergence was proven 
under suitable assumptions on the functional g. 
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Nevertheless, the above mentioned semismooth Newton methods are only locally con¬ 
vergent in general. In [39], a semismooth Newton method with filter globalization was 
presented where semismooth Newton steps are combined with damped shrinkage steps. 
Another globalized semismooth Newton method was developed in [28]. In loc. cit., inspired 
by [27,32,43,45], the method from [26] was globalized in a finite-dimensional setting for 
the special case of a quadratic discrepancy term 

1 ” 

min -||Ku - f|| 2 ( 6 ) 

k=l 


where K G ig injective and f G MN. In [28], F was shown to be Lipschitz continuous 

and directionally differentiable, i.e. Bouligand differentiable [17,43,50]. For such nonlin¬ 
earities a B(ouhgand)-Newton method can be defined [43], replacing (4) by the generalized 
Newton equation 

F'(u(^'),d(^')) =-F(u(^')). (7) 

In [28], the system (7) was shown to be equivalent to a uniquely solvable mixed linear 
complementarity problem [12]. By the choice (7), automatically is a descent direction 
with respect to the merit functional 0 ; —)• M, 

0(u):=||F(u)||i, (8) 


cf. [43]. Additionally, this Bouligand-Newton method can be interpreted as a semismooth 
Newton method with a specially chosen slanting function and is therefore called a B- 
semismooth Newton method, cf. [46]. By introducing suitable damping parameters, the 
method can be shown to be globally convergent under a technical assumption on the in 
practice unknown accumulation point u* of the sequence of iterates, see also [27,32,43,45]. 
Indeed, if the chosen Armijo stepsizes tend to zero, the merit functional 0 has to fulfill 
the condition 


lim 

(u,v)^(u*,u*) 


0 (u)- 0 (v)- 0 '(u*,u-v) 


= 0 


u- V 2 


(9) 


at u* to ensure global convergence. 

In this work, we present a modihed, globally convergent semismooth Newton method 
for the minimization problem ( 1 ) in the hnite-dimensional setting 


n 

min g{u) + Wk\uk\ ( 10 ) 

uSM" 

fc=l 


for general (not necessarily quadratic) strictly convex functionals g: M”" —)• M. Our work 
is inspired by Pang [44], where a globally and locally quadratically convergent modihed 
Bouligand-Newton method was presented for the solution of variational inequalities, non¬ 
linear complementarity problems and nonlinear programs. We take advantage of similar¬ 
ities of nonlinear complementarity problems and the zero-hnding problem ( 2 ) to propose 
a modihed method similar to [44]. Starting out from [28,40], we develop a globalized B- 
semismooth Newton method for general possibly nonquadratic discrepancy functionals g. 
In order to achieve global convergence without any requirements on the a priori unknown 
accumulation point of the iterates, inspired by [44], we propose a special modihcation of 
the Newton directions from (7), retaining the descent property w.r.t. 0. The result¬ 
ing generalized Newton equation is again shown to be equivalent to a uniquely solvable 
mixed linear complementarity problem. Fortunately, in our proposed scheme, under a 
technical assumption, full Newton steps are accepted in the vicinity of the zero of F. As a 
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consequence, under an additional regularity assumption, locally quadratic convergence is 
achieved. Additionally, the resulting modified method can be interpreted as a generalized 
Newton method proposed by Han, Pang and Rangaraj [27]. In a neighborhood of the zero 
of F, the modified method, under a technical assumption, coincides with the B-semismooth 
Newton method from [28] reformulated for nonquadratic g. If 51 is a quadratic functional, 
it was shown in [28] that in a neighborhood of the zero, the B-semismooth Newton method 
finds the exact zero of F within finitely many iterations. 

Alternatively, one may consider other globalization strategies as trust region methods 
or path-search methods instead of the considered line-search damping strategy, see e.g. 
[18,52] and the references therein. The path-search globalization strategy proposed by the 
authors of [14,47] could be a promising, albeit conceptually different, alternative. These 
approaches go beyond the scope of this paper and are part of future work. 

For the rest of the paper, we require the following assumption on the smoothness of 
g, similar to [40, Assumption 3.1, Example 3.4]. In Section 3, we will need a further 
assumption regarding the locally quadratic convergence of the method. 

Assumption 1. (Al) The function g is twice Lips chit z-continuously differentiable and 
the Hessian V^(/(u) is positive definite for all u G M"". Moreover, there exist con¬ 
stants 0 < Cl, C2 < 00 with 

cillhjjl < (V^5r(u)h,h) < C2llhjj2, for all h G 

uniformly for all u G M”. 

(A2) The level sets := {u G M"" : 0(u) < 0(u(°))} of Q are compact. 

The compactness of the level sets in the case of a quadratic functional <7(0) = ^jjKu — 
fjj 2, K G injective, n < m, f G was shown in [28]. Note that the positive 

definiteness of the Hessian V^(/(u) implies strict convexity of the functional g and ensures 
unique solvability of (10). 

The paper is organized as follows. Section 2 treats the proposed B-semismooth New¬ 
ton method and its modification as well as their feasibility. Section 3 addresses the global 
convergence and the local convergence speed of the methods. Numerical examples demon¬ 
strate the performance of the proposed algorithms in Section 4. 

2 A B-semismooth Newton method and its modification 

In this section, we present the algorithm of the B-semismooth Newton method from [28] 
generalized to the minimization problem (10) as well as a modihed version and discuss 
their feasibility. Additionally, we suggest a hybrid method. We start with the modified 
algorithm because the generalized B-semismooth Newton method can immediately be 
deduced from the modified method. 

2.1 A modified B-semismooth Newton method and its feasibility 

In the following, we introduce a modihed B-semismooth Newton method for the solution 
of (10). We denote the active set by A(u) := A^(u) U A^(u), where 


A+(u) := {k : 7 (V 5 r(u))fc 'ywk < Ufc} 
A~{u) := {k :uk < 'y{Vg{u))k - -fWk} 


( 11 ) 

( 12 ) 
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and the inactive set by X(u) := X°(u) UX"*"(u) UX (u), where 

X°(u) ;= {k : y{Vg{u))k - ywk < Uk < y{Vg{u))k + ywk}, (13) 

X+(u) ;= {k :uk = y{Vg{u))k + ywk}, (14) 

X-(u) ;= {k:uk = 7(Vg(u))k - jWk}. (15) 

Below, we drop the argument u if there is no risk of confusion. 

For F: —)■ M” defined by (2), we then have 

Fk(u) = mm{j(Vg(u))k+ 7 Wk,Uk}, /c G A+(u) UX°(u) UX+(u), (16) 

Xfc(u) = max{ 7 (Vc/(u))fc - 7 u;fc,Ufc}, /c G A~(u) UX°(u) UX"(u). (17) 


By Assumption 1, F is Lipschitz continuous and directionally differentiable. The direc¬ 
tional derivative of F can be easily deduced. 


Lemma 2. The directional derivative of Y at u G M" in the direction d G is given 
elementwise by 


^fc(u,d) 


7 (V^ 5 '(u)d)fc, k G A(u), 

dk, /cGX°(u), 

min{ 7 (V 25 ((u)d)fc, 4}, k G X+(u), 

max{7(V^5r(u)d)fc,4}, k G X"(u). 


(18) 


Proof. The claim is trivially true for k G A(u) and k G X°(u). For k G X'*'(u) we have 
with (14) and (16) 

min{ 7 (Vg(u + td))k + ywfc, Uk + tdk} - min{ 7 (Vff(u))fc -G ywk, Uk} 
t\o t 

_ min{ 7 (Vg(u + td))k - 'y{Vg{u))k, tdk} 
t\o t 

. fv 7(Vg(u-Ftd))fc-7(Vg(u))fc 'I 

U\o t / 

= min{ 7 (V^g(u)d)fc, 4}. 


The claim for k £ I (u) results analogously. 


□ 


The directional derivative of the merit functional 0 from (8) at u G MT in the direction 
d G M” is given by 0'(u, d) = 2(F'(u, d), F(u)), where (•, •) denotes the Euclidean scalar 
product, see e.g. [28, Lemma 3.2]. 

To introduce the modified semismooth Newton method, we define the subsets 


A:^(u) := {k : 7 (Vg(u))k + ywk < Uk < 0}, (19) 

Al(u) := {k:0 <Uk < y{Vg{u))k - ywk}, (20) 

X+(u) := {k : j(Vg(u))k - ywk < Uk < y{Vg{u))k + ywk < 0}, ( 21 ) 

X° (u) := {/c : 0 < 7 (Vg(u))fc - ywk < Uk < 7 (Vg(u))fc + ywk}- (22) 

Inspired by [44], we define the modified index sets 

A+(u) := A'''(u) \ A|(u), (23) 

A“(u) := A“(u) \ Al(u), (24) 

X^(u) := X°(u) \ (X;(u) UX° (u)), (25) 

X+(u) ;=X+(u)uA+(u)UX°(u), (26) 

X^(u) ;=X-(u)uAl(u)UX°(u). (27) 
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We denote ^(u) := ^+(u) U^”(u) and X(u) := X°(u) UX+(u) UX“(u) respectively. The 
subsets (19)-(22) fulfill ^+(u) = 0, ^l(u) = 0, X^(u) = 0 and X° (u) = 0 if F(u) = 0. 

In the following lemma, we consider a linear complementarity problem which is impor¬ 
tant for all further discussions, cf. [28]. 

Lemma 3. Let u G and M := V^ 5 '(u). The linear eomplementarity problem 


with 


(28) 


N =N(u) 


^2:+,X+ '^X+ ^X+,I- 


I . - A,A A,X+ X+,A A,A 


^X- ,A^'^A,A 


^x-,x+ 


‘-x-,x- 




X- A^'^AA A,x 


and 


z = z(u) 

_ f -|- F(u)^ 




+ M-. 


MT^-^F(u);^ - F(u 


X-,A A,A 


(29) 


(30) 


has a unique solution. 


Proof. By Assumption 1, M = V^ 5 '(u) is symmetric and positive definite. Therefore, N 
from (29) is symmetric and positive definite, see [28, Lemma 3.3]. Hence (28) is uniquely 
solvable, see [12, Theorem 3.3.7] and [28, Theorem 3.5]. □ 


Now we can define the generalized Newton equation for F, cf. [28]. Let u G ML and 

B = H(u) := ^(u) U {/c G X+(u) UX~(u) : Xk > 0}, 

C = C(u) := X°(u) U {/c G X+(u) UX^(u) : Xk = 0}, 


where x = {xk)k is the unique solution to the linear complementarity problem (28). Then, 
by defining the generalized derivative blockwise 


(G{\i)b,B G(u)h,c^ /^7(V^5'(u)) 6,B 7(V25r(u))6,c^ ^ ^nxn 

VG(u)c,b G(u)c,cy V Ic,c ) 

the modified semismooth Newton method is given by 

G(u(^'))d(^') = -F(u(^')), 

y(i+i) _ y(i) _|_ i-^H) ^ j = 0,1,... 

with suitably chosen damping parameters tj G (0,1]. 

Remark 4. In [40], Muoi et al. chose the slanting function 

/G(u)^,^ G(u)^,x\ /7(V25r(u))^,^ 7(V^(/(u))^7\ 

\G(u)j^^ G(u)x^iy Ix,x / 


(32) 

(33) 

(34) 

(35) 


blocked according to the active and inactive sets, to define the local semismooth Newton 
method (4),(5). The key difference of (32) compared to (35) is the modification of the 
index sets. Note that G from (32) is not a slanting function in general because in regions 
where F is smooth, G does not coincide with the Frechet-derivative of F. 
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Let G and M := Then G solves (33) if and only if 

7 ( Md (^’));4 = - F ( u (^'))^, 


U 

d— = —U— 


and 


X := 


X+ 1+ 

_dW 


:= N(uW)x + z(uO')) = 


(MdO'))^+(F(uW))^ 


(36) 

(37) 


(38) 


,^-^-(F(uO')))_ 

where x, y solve the linear complementarity problem (28), cf. [28, Lemma 3.4]. 

We summarize the above observations in the following lemma, cf. [28, Theorem 3.5]. 

Lemma 5. Let x be the unique solution to (28) for an iterate G M"" and M := 
Then, the Newton update d*^^) from (33) is given by 


d— = —U— 
x° x° ’ 

d^ = x^ - u^, 
x+ 2 :+ x+ ’ 

di^ = -x^ - u^, 


(39) 


d^^ = -M 
A ry 




-7M:,,jd“-F(uO>7). 


Before proceeding, we prove some useful identities similar to [44, Lemma 2]. 

Lemma 6. Let u G M”, d = d(u) the unique solution to (39) and M = V^(/(u). For 
/c G {1,..., n}, we have the following identities 


{y{Ng{\\))kNyWk)l{M.d)k =-{y{Vg{M))k + yWkf, A: G A+(u), (40) 

(7(V5r(u))fc - 7r(;fc)7(Md)fc =-(7(V5((u))fc - yrcfc)^, A:GA-(u), (41) 

Ukdk = -ul, k G 21° (u). (42) 

Additionally, for k G A!jl(u) UX^(u) U {/c G X+(u) : Ffc(u) < 0)} the inequality 

{y{Wg{\i))k + 7 tCfc) 7 (Md)fc < -( 7 (V 5 r(u))fc + ywkf (43) 

holds, for k G Al(u) UXL(u) U {A G X^(u) ; Xfc(u) >0)} we have 

{y{Vg{\i))k - 7 rcfc) 7 (Md)fc < -( 7 (V 5 r(u))fc - ywkf (44) 

and for k G Al|l(u) U Al(u) U X^(u) U X° (u) U {/c G X+(u) : Xfc(u) < 0)} U {/c G X~(u) : 
X]fc(u) > 0)} we have 


'^kdk ^ Uk‘ 


(45) 


Proof. Equations (40), (41) and (42) immediately follow from (36) and (37). For k G 
Al|l(u)UX^(u)U{A: G X+(u) : Xfc(u) < 0)}, we have by definition j{Vg{u))k+ywk < 0 and 
with (28) and (38) we have 7 (Md)fc > —Ffc(u) > —{'y{Vg{u))k + 'ywk) implying (43). For 
k G Al(u)UX° (u)U{A: G X~(u) : X).(u) > 0)}, we have by definition j{Vg{u))k — '~fWk > 0 
and with (28) and (38) we have 7 (Md)fc < —Ffc(u) < —{'y{Vg{u))k — 'ywk), implying (44). 

For k G A^(u) U X^(u) U {A: G X+(u) : Ffc(u) < 0}, we have Uk < 0 and dk > —Uk 
because of (28) and (38). If A: G Al(u)UX° (u)U{A G X~(u) ; Fk{u) > 0}, we have > 0 
and dk < —Uk because of (28) and (38). In both cases (45) follows. □ 
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Now we verify that d = d(u) from (39) is a descent direction of the merit functional 
0 from (8) at u. 

Lemma 7. Let u G M"" with 0(u) > 0 and M := V^ 5 ((u). Let d = d(u) G be the 
solution to (39). Then, we have 

0^(u, d) < —20(u) < 0, (46) 

i.e. d is a true descent direction of Q at u in the direetion d. 

Proof. The proof follows the idea of [44, Proof of Proposition 5]. We have 


0'(u,d) = 2(F(u),F'(u,d)) = 2^r„ 


i=l 


where we have with Lemma 2 


Ti := ^ Ffc(u)7(Md)fc, T 2 := ^ Ukdk, 

fc£.4(u) k£X°{u) 

Ts := ^ Ffc(u)min{ 4 , 7 (Md)fc}, T 4 := ^ ^.(u) max{4, 7 (Md)fc}, 

fcSl+(u) k&X~{u) 

n:= Fk{uh{Md)k, n:= FkHj{Md)k, 

fce.44u) fce.4.i{u) 

Tj := ^ Ukdk, Ts := ^ Ukdk- 

fcei^(u) fcei°(u) 

For k G X’'^(u), we have 

min{4,7(Md)fc} = min{4 + 4.(u), 7 (Md)fc + 4(u)} - Ffc(u) = -^.(u) 
because of (28) and (38). Similarly, for k G X”(u) we have 

max{ 4 , 7 (Md)fc} = max{4 + 4(u), 7 (Md)fc + Ffc(u)} - 4(u) = -Ffc(u). 
With (40)~(45), we obtain 


0'(u,d) = 2V4<-2V4(u 


i=l 


k=l 


-20(u), 


finishing the proof. □ 

We choose the stepsizes tj G (0,1] in (34) by the well-known Armijo rule 

tj := max{/3^ : 0(u^-^^ -|- < (1 — 2(T/3^)0 (u*^-^^), / = 0,1,...}, 

where (d G (0,1) and a G (0, ^), see also [27,28,32,43-45]. These stepsizes can be computed 
in finitely many iterations. We cite the following lemma from [28, Proposition 4.1]. 

Lemma 8. Let ft G (0,1), a G (0, ^). Let G ML with 0(u('^)) > 0 and let d^-^) = d(u('^)) 
be eomputed by (39). Then, there exists a finite index I G N with 

0(u(^') + Id^d^^^) < (1 - 2(7/3')0(u(^')). 


(47) 
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Algorithm 1 The B-semismooth Newton methods BSSN, modBSSN and hybridBSSN 

Choose a starting vector £ R", parameters /? £ (0,1), c £ (0, |) and a tolerance tol > 0 and set 
j 0. In case of hybridBSSN, additionally choose jmax £ N and tmin > 0. 
if BSSN is used or hybridSSN is used then 

Replace the modified index sets (23)-(27) by the index sets (11)-(15) in (28)-(30) and (39). 

end if 

while ||F(u ^'’^)||2 > tol do 

Compute the Newton direction from (39). 

tj 1 

while ©(u^'’^ + > (1 — 2atj)Q{u^^'^) do 

tj —tjp 

end while 

y(j+l) ._ y(i) _|_ ^ .jjO) 

J := i + 1 

if hybridSSN is used and j > jmax and tj < tmin then 

Use the modified index sets (23)-(27) in (28)-(30) and (39) for all following iterations. 

end if 
end while 


Proof. According to Lemma 7, it holds < —20(u(-^)) < 0. The remainder of 

the proof follows [28, Proof of Proposition 4.1]. □ 

The algorithm of the modified B-semismooth Newton method, in the following denoted 
by modBSSN, is stated in Algorithm 1. The feasibility of Algorithm modBSSN is guaranteed 
because of the lemmata stated above. 

Remark 9. Pang [44] introduced a modified B-Newton method for a nonlinear complemen¬ 
tarity problem. Han, Pang and Rangaraj [27] interpreted this iteration as a generalized 
Newton method 

= 0 , -|- tjd^^\ j = 0,1,..., 

where G: M” x M" —)■ M" fulfills the assumption that G(u, •) is surjective for each fixed 
u G M"", and 

2<F(u),G(u,d))>0'(u,d) 

for all u, d G M”, see [27, Section 2.3]. In the very same way, our Algorithm modBSSN can 
be interpreted as a generalized Newton method with G(u(-^), = G(u and G 

from (32), cf. Lemma 7. 

2.2 A B-semismooth Newton method and its feasibility 

The generalized formulation of the B-semismooth Newton method (5), (7) from [28] for the 
setting (10), in the following denoted by BSSN, is identical to Algorithm modBSSN replacing 
the modified index sets (23)-(27) by the original index sets (11)-(15) in (28)-(30) and (39), 
cf. Algorithm 1 and [28]. Analogously to the proofs in Section 2.1, the Newton directions 
d(i) can be shown to be uniquely determined and the Armijo stepsizes are well-defined 
because the Newton directions are descent directions w.r.t. the merit functional 0. Thus, 
the feasibility of the Algorithm BSSN is guaranteed. 

Remark 10. The modification of the index sets in Algorithm modBSSN is needed to prove 
global convergence without any additional requirements, see Section 3. Let u* be the 
unique zero of F and let X'’'(u*) UX~(u*) = 0, i.e. F is smooth at u*. Then, there 
exists a neighborhood U of u* where the index subsets (19)-(22) are empty for all u G (7, 
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i.e. the modified index sets (23)-(27) match the original index sets (11)~(15). Therefore, 
Algorithm modBSSN locally coincides with BSSN in a neighborhood of the zero u* of F if 
X'’'(u*) UX“(u*) = 0 and hence is a semismooth Newton method there. 

2.3 A globally convergent hybrid method 

The B-semismooth Newton method (Algorithm BSSN) from Section 2.2 is efficient in prac¬ 
tice because the index sets X=*=(u(-?)) in step j are usually empty so that the generalized 
Newton equation simplifies to a system of linear equations of the size |A(u('^))|. The size 
of the system of linear equations usually decreases in the course of the iteration. Nev¬ 
ertheless, the method may fail to converge, see Remark 10 and Theorem 15. However, 
the global convergence of Algorithm modBSSN from Section 2.1 is ensured by Theorem 12 
but here a mixed linear complementarity problem has to be solved in each iteration, see 
(39). Additionally, in order to set up the matrix N and the vector z from (29) and (30), 
|X(u^-^^)| -|- 1 systems of linear equations of the size |A(u('^)| with the same matrix have 
to be solved if X=*=(u('^)) / 0. Note that in (36) resp. (39) no additional system of linear 
equations has to be solved for the computation of d^\ Nevertheless, Algorithm modBSSN 
is usually less efficient than Algorithm BSSN. 

We suggest a hybrid method by starting with Algorithm BSSN and switching to Algo¬ 
rithm modBSSN when Algorithm BSSN begins to stagnate, by replacing the modified index 
sets (23)-(27) by the index sets (11)-(15) in (28)--(30) and (39). In our numerical exper¬ 
iments, we switch to Algorithm modBSSN if the number of Newton steps exceeds a limit 
jmax £ H and if the chosen stepsize is smaller than a threshold tmin > 0, i.e. if j > jmax 
and tj < tmin- In the sequel, this hybrid method is called hybridBSSN. An overview of 
the proposed methods is given in Algorithm 1. Similar hybrid methods, combining the 
fast local convergence properties of a local semismooth Newton method with the globally 
convergent generalized Newton method from [27] were proposed by Qi [45] and Ito and 
Kunisch [32]. 

3 Global convergence and local convergence speed 

In this section, we consider the convergence properties of the algorithms from Section 2. 

3.1 Convergence of the modified B-semismooth Newton method 

In the following, we address the global convergence of Algorithm modBSSN and its con¬ 
vergence speed in a neighborhood of the zero of F. Concerning the boundedness of the 
sequence of Newton directions we cite [28, Proposition 4.6]. 

Lemma 11. Let u G M” and d = d(u) be the solution to (39). Then, there exists a 
constant C = C{n) > 0 independent of u, with 

||d||2 <C||F(u)||2. (48) 

Proof. The proof follows [28, Proof of Proposition 4.6] by substituting the index sets 
X° and X^ by the modified index sets A^, X° and X=*= respectively. An inspection of 
the proof of Lemma 3.3 from [28] and Assumption 1 shows that the Rayleigh quotient of 
N = N(u) from (29) is bounded from below. □ 

In the following theorem, we present our main result on the global convergence of 
Algorithm modBSSN. 
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Theorem 12. Let u* G be an accumulation point of the sequence of iterates 
produced by Algorithm modBSSN. Then, we have 0(u*) = 0. 

Proof. We proceed analogously to the proof of [44, Theorem 1] and we also use the proof 
of [44, Proposition 1], We suppose > 0 for all j, because otherwise the claim is 

proven. Because of the Armijo rule (47), the sequence {©(u^'^))}^ strictly decreases and is 
bounded from below by 0, i.e. convergent. Let tj = be the computed Armijo stepsize 
in step j. From the Armijo rule (47), it follows 

0 < 2cjfj©(u^'^^) < ©(u^-^^) — ©(u^'^"’"^^) —)■ 0, j —)■ oo. 

Therefore, we have 

lim tjQ{u^^^) = 0. 

j^OD 

The level set L©(u(°)) = {u G M"" : ©(u) < ©(u^'^^)} is bounded by Assumption 1, 
implying that the sequence is bounded and has an accumulation point u*. Let 

be a subsequence converging to u*. If the stepsizes tj are bounded away from 
zero, i.e. we have limsupj_^(^jgjtj > 0, it directly follows ©(u*) = 0. 

Let us now consider the case limsupj_^ocjgjfj = 0. Without loss of generality, we 
suppose limj^oo,j£j tj = 0. By the Armijo rule (47), we have for all j £ J 

©(u(^')) - ©(u(^') + < 2u/l'^-^©(u(^)). (49) 

We define ;= The sequence of Newton directions is bounded 

because of Lemma 11, implying that u* is the limit of the subsequence There¬ 

fore, without loss of generality we have 

A+(u*) c A+(u(^’))nA+(u(^')), 
a:(u*) c A:(u(^’))nA:(u(^')), 
x;(W)cx;(u(^'))nx;(u(^')), 
x°(u*) cx°(u(^'))nx°(£i(^')), 

for all j £ J large enough. Now we consider 

8 

©(u(^'))-©(u(^')) = (50) 

i=l 

where 

Xi:= J] (Xfc(u(^))2-Xfc(u(^'))2), f2-.= Y. {FkiviY^ 

fcel4(u*) fcei°(u*) 

Xs- (Xfc(u(^))2-Xfc(u(^))2), r4:= (Xfc(u(^))2-Xfc(u(^))2), 

fcSl+(u*) fcSX“(u*) 

X5:= J] (Xfc(u(^'))2-Ffc(u(^'))2), X6:= (Xfc(u(^))2 - Xfc(u(^))2), 

fceTg(u*) fceTI(u*) 

fT.= Y -FkinY^), Xg- ^ (Xfc(u(^))2-Xfc(u(^))2). 

fcei7(u*) fcei°(u*) 

In the following, we estimate each sum from below. Finally, we prove the claim by using 
(49) and by taking the limit j ^ oo, j £ J. 
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li k e ^(u*), we have for j G J large enough k G k G or A: G 

Using (40), (41), (43) and (44), we obtain 

Ti = ^ ((7(V5'(u^-^)))fc ± 'jwkf - i'y(yg{u^^'>))k ± 'yu’kf) 

keA{u*) 

= ^ -2/3'^'“n7(V5f(u^^^))fc ±7^i’fc)7(V^5(u^^^)d^^^)fc 

fce,4(u*) 

+ o(||u^-^^ — II 2 ) 

> 2/3'^'“^ ^ Fkiu^^^f + o(/3'j“^||d(^)||2), j -> oo,j G J. 

keA{u*) 

Analogously it follows with (43) and (44) 

n>2/3^^-^ Y, Ffc(u(^’))2 + o(/3'^-i||dW||2), 

kGA^iu*) 


and 


T6>2/3'i-i J] Ffe(u(^'))2 + o(/3'i-i||d(^')||2), j^oo,jGJ. 

fce^I(u*) 


For k G /°(u*), we have to consider the cases k G X°(u(-^)), k G X^(u('^)) and k G X° (u^-^^). 
With (42) and (45), we have 


7 = E ((““t-( 4 '’t) = E 




fcGX'^(u*) 


/cGX°(u*) 


> 2 / 3 '^-^ ^ 


U 


( i )\2 






kGX°{u*) kGX°{u*) 

2/3b-i ^ Ffc(u(^’))2- ^ (/3'i-id 


7-u(i)'|2 
fc 1 • 


fcei°(u*) 

Accordingly, it follows with (45) 


fc£l°(u*) 


and 


TV >2/3'^-^ E E 

k£l‘^{u*) k£l^{u*) 

T 8 > 2 / 3 '^-^ E E 

fcei!.(u*) fcei4(u*) 


In the following, we treat the sum T3. For k G X'*'(u*), we may assume without loss of 
generality 

k G (x+(u(^')) UX°(u(^')) U^+(u(^'))) n (x+(£i(^')) UX°(u(^')) U^+(u(^’))). 

We split X’*"(u*) = 5i(u*) n 52(u*) n 53(u*), where 


5i(u*) := {k:ul = 7 (V 5 ((u*))fc + jWk > 0}, 
52(u*) := {k:ul = j{Vg{u*))k + jWk = 0}, 
*S'3(u*) := {k :ul = j{Vg{u*))k + jWk < 0}. 
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For k G 5i(u*), we may assume with (16) 

+ ywk} > 0, 

= mm{u^^\y{Vg{u^^^))k + ywk} > 0. 

Therefore, we have 

Ffc(u(^))2 - Fk{u^^'>f = min{(M^-^^)2, (7(Vc/(u(^)))fc + ywkf} 

- min{(4‘^^)^, (7(V(/(u(^)))fc + ywk)'^}. 

In the case k G X°(u('^)), we have k G X°(u('^)) or /c G X° because Fk{vS^'^) > 0. With 

(42) and (45), we have 

Ffc(u(^'))2 - Ffc(u(^'))2 > (u4)2 _ ^ - {15^^-^dff 

For k G it follows k G A+(ui-^i) because Ffc(uit)) > 0. Hence, one has with (40) 

Fkin^^'^f - Fk{h^^'> f 

> (7(Vc/(ui^i))fc + 7u;fc)2 - (7(Vc/(ui^i))fc + 7u;fc)2 
= - 2/3^J“^(7(Vc/(u(^i))fc + 7u;fc)7(V^c/(u(^i)di^i)fe + o(||/3^J“Mi-^iII 2 ) 

= 2/3'^-iFfc(u(^'))2 + o(/3'^-i||d(^')||2), j ^ 00, j G J. 

If /c G X'^(ui'^i), we have Ffc(uit)) = = 7 (V 5 '(ui-^i))fc + yivk and we have either 

7 (V^ 5 f(uit))diJi)fc = —( 7 (V( 7 (ui-^i))fc + 7 rcfc), see (28) and (38). As in the 
cases k G X°(uit)) and k G A"'"(uit)), conclude 

Ffc(u(^’))2 - Ffc(u(^’))2 > 2/l'^-iFfc(u(^'))2 + o(/3^^-i||dW||2), j ^ oo,i G J. 

Altogether, we get 

(Ffc(u(^'))2-Ffe(u(^))2) 

fceSi(u*) 

>2/3h-i ^ Ffc(u(^'))2^o(/3'i-id(^')||2), j^ooJeJ. 

k£Si (u*) 

For k G 5*2 (u*), we have with Lipschitz-constant L of Fk 

Ffc(u(^'))2 _ i7^(u(i))2 ^ (Ffc(u(^')) - Ffc(u(^'))) (Ffc(u(^')) + Ffc(u(^'))) 

< L||ui-^i — ui'^'i||2|Ffc(ui-^i) + Ffc(ui-^i)| 

= L/l'^-i||d(^')||2|Ffc(u(j')) + Ffc(u(^4|. 


lim 

j^oo,j£j 


E 

fceS2(u*) 


Ffc(u(^'))^-Ffc(u(t))2 

/3b -1 


= 0 . 


It follows 
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Let now k G S' 3 (u*). We may assume < 0, < 0, + yiffc < 0 and 

7 (V 5 ((u('^)))fc + ^Wk < 0. With (16), one has 

=max{( 4 ^^)^ (7(V5r(u(^)))fc+ 7 u;fc)^} 

- max{(4^^)^ (7(V5(u(^)))fc + 7 ^^)^}. 

First, we treat the case < {'y{Vg{u^^^))k + 'yw^)'^. We have to consider the cases 

k G k G and /c G {A: G ; Fk{u^^^) < 0)}. With (43), we have 

Ffc ( u (^))2 - Ffc ( u (^))2 

> {j{Vg{u^^'>))k + JWkf - (-/(Vg(u^^'>))k + JWkf 

= - 2/3^J"^(7(V5r(u(^^))fc + 7r(;fc)7(V^5r(u(^^)d(^))fc + o(^'^"^||d(^)|| 2 ) 

> 2/3^J“^(7(Vc/(u(^)))fc + 7 rcfc)^ + o(/3^^“^||d(-^)||2), j-> 00 , j G J. 

Second, we consider the case (tt^ ^)^ > {'-y{Vg{u^^'^))k +^Wk)‘^. With (45), we have analo¬ 
gously 

Fk{u^^')f - Ffc(u (^'))2 > - (4"''^ = -2/3^^-i4^^4‘'^ - (/3'^“^44 

Altogether, we obtain 

(Ffc(u(^)) 2 -Ffc(£i(^)) 2 ) 

fceSalu*) 

>2/3'^-i min{(4^'4,(7(V<7(u(^)))fc + 7^fc)2} 

fceSalu*) 

+ o(/3^^"^||d(^)||2), j ->oo,j G J. 


By symmetry, we can treat the sum T/^ similarly. For j —)■ oo,j G J, we get 

J] (Ffc(u(^'))2 - Ffc(u(^'))2) 

{fceX-(u*):n*7^0} 

> 2/3b-i ^ min{(4^4>(7(V5f(u^^^))fc-7w^fc)^} 

{fceX~(u*):n*^ 0 } 

+ o(/3'^-i||d(^')||2), 

and 

lim V 

j^oo,j&J ^ db-l 

{fcel+(u*):«*= 0 } ^ 

Finally, we divide both sides of the inequality (49) by and take the limit j 
j G J, obtaining with (50) and the previous estimates 

20 (u*) < 2 cj 0 (u*). 

Here, we use the fact that the sequence is bounded, implying 


j->oo,jeJ 13' 


7-1 


j^ooj&j (3h i||dL)||2 


The choice o" < 1/2 implies 0(u*) = 0, finishing the proof. 


00, 


□ 
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As a consequence of the last theorem, we can argue that the stepsizes in Algorithm 
modBSSN are eventually chosen equal to 1. In the following theorem, we additionally assume 
that g is more regular and that F is smooth at the unique zero u*, i.e. X+(u*)UX~(u*) = 0. 

Theorem 13. Let g be three times continuously differentiable. Let be a sequence 

produced by Algorithm modBSSN converging to a limit point u* with UX~(u*) = 0. 

Then, there exists an index jo G N such that tj = 1 for all j > jo- 

Proof. We proceed as in the proof of [44, Theorem 2], Inspired by loc. cit., we show that 
for all j large enough, we have 

> 2(T0(u('^^). (51) 

We show the claim by contradiction. Let the subsequence fulfill 

0(u^-^^) — 0(u^-^^ + < 2cj0(u^-^)) (52) 

for all j £ J large enough. Because of Lemma 11, we have ||d ('^^||2 < C'||F(u ('^^)||2 with a 
constant C > 0. Therefore, with := the sequence has the limit 

u*. We consider 

2 

0(uW)-0(u(^')) = ^f;, (53) 

i=l 

where 

Ti:= (Ffc(u(^'))2-Ffc(u(^'))2), 

fceT(u*) 

f2:= 

kex°{u*) 

Because of Theorem 12, we have 

A+(u*) = {k :0 = y{Vg{u*))k + ywk < uf}, 

"^~(u*) = {/c : 0 = 7 (Vc/(u*))fc - ywk > uf}, 

I°{u*) = {k : j{Vg{u*))k - ywu < = 0 < 7 (Vg'(u*))fc + ywk}. 

For all j G J large enough, we have 

A+{u*) c :4+(u(^’)) n:4+(u(^')), 

A-{u*) c :4^(u(^’)) n:4^(u(^')), 
x°(u*) c x^(u(^’)) nx°(u(^')). 

Lemma 11 implies the boundedness of the subsequence {d(-^)/||F(u(-^)) || 2 }jej of quotients 
and without loss of generality, this subsequence has a limit d G ML and the subsequence 
{F(uO-))/||F(uO'))||2},6j of unit vectors tends to a unit vector F G M”. 

Similar to the proof of Theorem 12, we estimate the sums Ti and T 2 . First, we treat the 
sumTi. Because k G A(u*) C A(u’^'^^)nA(u’^'^)), we have Ffc(u’^'^^)+ 7 (V^ 5 ((u('^))d(-^))fc = 0. 
Dividing by ||F(u ('^))||2 and taking the limit j —)■ 00 , j G J, it follows 

Ffc + 7(VVu*)d)fc = 0. 
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There exists a vector v on the line segment between and with 


Ti= Y1 

k&A(u*) 

= ( - 2Tfc(u(^))(7V^5r(u(-^))d(^))fc - (7V^5r(v)d(-^))| 


A:eT(u*) 


k ^ "f ^ QuidUmdUk ‘ 


Lra=l 


Y. (2ft(u“f - (7V=9(v)dO))| - ft(v)7 E Dujlldu/ ^ 


duidumduk 


A:S,4(u*) l,m=l 

Dividing by ||F(u('^^)II 2 and taking the limit j —)• 00 , j G J, it follows 

Ti 




j^ 'co,j&J ||F(u(-?))||2 


E H 

k£A{u*) 


Now we consider the sum T 2 . We have k G X°(u*) C nX°(u(-^)) and 


T2= Y. {Fk{u^^^f-Fk{n(Ff)= Y ii^k^f-i^k'^f) 


kel°{u*) 


A:Sl°(u*) 


fcel°(u*) 

Therefore, we have 


lim 

^oo,jg J 


T 2 


|F(uO'))||2 


E 

fcei°(u*) 


Finally, we divide both sides of the inequality (52) by ||F(u('^^)||| and take the limit 
j —)• 00 , j G J, obtaining 


F|li<2a||F||i 


which is a contradiction to ||F ||2 = 1 and the choice a < 1/2 in the Armijo rule (47), 
finishing the proof. □ 

Now we consider the locally quadratic convergence of Algorithm modBSSN in the case 
that the stepsizes tj are eventually chosen equal to 1, i.e. according to Theorem 13 espe¬ 
cially in the case X’^(u*) UX~(u*) = 0. In the following theorem, we need the bounded in- 
vertibility of G(u) from (32) in a neighborhood of the zero u* of F. Because M := V^(/(u) 
is symmetric and positive definite, the inverse of G at u is bounded by a constant C > 0 

||G(u)-i2 < l|Mg;g||2(i + IlMe^clb) + 1 < l|M-i2(^ + IIMII 2 ) + 1 < G, (54) 

see [26, Proposition 3.11] and [40, Lemma 3.6]. The boundedness follows from Assumption 
1. For the following theorem, we need again the additional assumption that g is three times 
continuously differentiable. 
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Theorem 14. Let g be three times continuously differentiable and let the stepsizes tj be 
chosen equal to 1 for all j large enough. Let be a sequence produced by Algorithm 

modBSSN converging to u*. Then, there exists a constant C > 0 so that locally quadratic 
convergence is achieved, i.e. for all j large enough, we have 

||u(i+i) _u*||2 < C||u(^') -u*||i. 

Proof We follow the proof of [44, Theorem 3]. By assumption, we have tj = 1, i.e. 
u(i+i) = u(i) _|_ for all j large enough. With from (31), we have 

= 0, for k G 

= 0, for k G 

Because u* is the limit of we have for j large enough 

A(u*) C ({A: : > y{Vg{u^^'>))k + ywk A > 0} 

U {k : < 7(Vc/(u(^)))fc - ywk A < 0}) 

This yields the inclusion C(u('^)) C X’^(u*) UX“(u*) UX°(u*), implying = 0. 

Analogously, we have for j large enough 


X°(W) 

C {k : - 7(V5((u(-^')))fc| < ywk A 'y(yg{u^^'>))k + ywk > 0} 

U {k : - 7(V5r(u(-^'^))fc| < 7u;fc A 7(V5r(u(-^')))fc - ywk < 0} 


Consequently, we have C X+(u*) UX“(u*) U A(u*), implying 0 = X).(u*) = 

7 V 5 '(u*)fc ±-jWk, respectively, for all k G 

Skipping the arguments B = C = C{u^^'>), we obtain with = 0, F(u*)g = 0 

and the mean value theorem 


/(G(u0))(u0+i)-W))^\ 

V(G(u0))(u0+i)-W))J 

( (V25(u(^)))b,c\ f - u*)s\ 

V 0c,B Ic,c J + u(A) -u*)cj 

/—F(u('^^)g + 'yV‘^g{u^^'^)Q^t3{u^^'^ — u*)g + — u*)c 

+ (u(A) - W)c 

•(•'A) 


^ d^g{v) / * * 0 )^^ 

E ^ du/u ju, (^l - )(^m - <’)) 

l,m=l ' 


k£B 


Oc 


where v is a vector on the line segment between u^A) ^nd u*. For j large enough, the 
matrix G(u('^^) is boundedly invertible by Assumption 1, cf. (54). Therefore, there exists 
a constant C > 0, depending only on u*, with 

||u(i+i) _u*||2 < C||u(^') -u*||i, 


for all j large enough, proving the claim. 


□ 
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Note that in case of a quadratic functional ^(u) = ^IlKu—f||| with K injective, G(u)“^ 
was shown to be uniformly bounded in a neighborhood of the zero u* of F [26]. Hence, 
in case of a quadratic functional g with X'’'(u*) UX“(u*) = 0, the stepsizes in Algorithm 
modBSSN are eventually chosen equal to 1, locally quadratic convergence is achieved and 
u* is found within finitely many steps, see also Remark 10 and [28]. For other functionals 
g, these conditions need to be verified. 

3.2 Convergence of the B-semismooth Newton method 

In this section, we consider Algorithm BSSN, i.e. the B-semismooth Newton method from 
[28] generalized to the minimization problem (10), see Section 2.2. We cite the convergence 
theorem from [28, Theorem 4.8], see also [27, Theorem 1]. 

Theorem 15. Let Assumption 1 be fulfilled and let be a sequence of iterates 

produced by Algorithm BSSN from Section 2.2. Let {tj}j be the chosen stepsizes. 

(i) If liuisupj^^tj > 0, then — )■ u*, j —)■ oo with 0(u*) = 0. 

(a) // lim supj^oo = 0 and if u* is an accumulation point of where condition 

(9) holds at u*, then —)■ u*, j —)■ oo with 0(u*) = 0. 

Proof. The proof follows [28, Proof of Theorem 4.8] using Assumption 1, Lemma 5, Lemma 
8 and Lemma 11. □ 

Analogously to [28, Corollary 4.10], we can deduce from [45, Theorem 4.3, Corollary 
4.4] that if the zero u* of F is an accumulation point of a sequence of iterates 

produced by Algorithm BSSN, the sequence converges locally superlinearly to u* 

and the stepsizes tk are eventually chosen equal to 1. Nevertheless, the modification of the 
index sets is essential for the modified B-semismooth Newton method (Algorithm modBSSN) 
to overcome the theoretical drawback of the technical assumption (9) in Theorem 15, see 
Section 3.1. 

3.3 Convergence of the hybrid method 

The global convergence and the local convergence speed of Algorithm hybridBSSN from 
Section 2.3 directly follow from Theorem 12 and Theorem 14 resp. Section 3.2. The 
method combines the efficiency of Algorithm BSSN and the stronger convergence properties 
of Algorithm modBSSN. 

4 Numerical results 

In this section, we present numerical experiments demonstrating our theoretical results. 
We first consider image deblurring for gray-scale images degraded by motion blur. This 
is a linear inverse problem and in the presence of noisy measurement data regularization 
is essential. Assuming that the image is sparse, i.e. it has only few nonzero pixels, we 
apply £i-penalized Tikhonov regularization, compare (6). Here, Assumption 1 is fulfilled. 
Second, we consider a nonquadratic functional g arising in robust linear regression. If data 
is degraded by outliers, instead of minimizing the ordinary least squares functional one 
may choose a more robust objective function, see e.g. [2,10,23]. Giving preference to simple 
models, we add a sparsity promoting penalty term as proposed in current research effecting 
that irrelevant coefficients are set equal to zero, see e.g. [1,37,51] and the references therein. 
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For the arising minimization problem (10), it is not ensured that all prior assumptions are 
fulfilled. Nevertheless, convincing numerical results are achieved. 

For our numerical experiments, we use MATLAB® 2015a and the computations are 
run on a desktop PC with Intel® Xeon® CPU (W3530, 2.80 GHz). In Algorithm modBSSN, 
Algorithm BSSN and Algorithm hybridBSSN, see Algorithm 1, we choose the Armijo pa¬ 
rameters a = 0.01 and /3 = 0.5. The stopping criterion is a residual norm ||F(u ('^))||2 
smaller than 10“'^ in all computations. If not otherwise stated, the zero vector is chosen 
as starting vector. In Algorithm hybridSSN, we choose jmax = 250 and tmin = 10“®. 

The performance of Algorithm modBSSN, Algorithm BSSN and Algorithm hybridBSSN 
depends on the choice of the parameter 7 as well as, at least concerning Algorithm modBSSN, 
the particular solver for the linear complementarity problem (28). In our numerical exper¬ 
iments, the linear complementarity problem is solved with the modihed damped Newton 
method from [30]. This algorithm is a specialization of the method from [43] to linear 
complementarity problems. It was shown in [22] that the method finds the true solu¬ 
tion to the linear complementarity problem within finitely many iterations. The stopping 
criterion for an iterate x is here chosen as jjminjx, z -|- Nx}jj 2 < 10“^. If the starting 
vector x^*^) G fulfills (z -|- Nx^^))^ 7^ x® for all k where N, z from (29) resp. (30), 

which is the case if e.g. x^^^ := 0 and if 7 ^ 0 for all k, the Newton method only poses 
one linear system per iteration [30]. We choose x*^^) ;= 0. If this condition is violated 
by the starting vector or if more than 50 Newton steps are needed, we switch to an im¬ 
plementation^ of Lemke’s algorithm [12,53]. The damped Newton method from [30] is 
often faster than Lemke’s method in terms of computational time, see also the numerical 
results in [30]. We also tested an interior point method using the MATLAB function 
quadprog and an implementation^ of the semismooth Newton-type method [21] based on 
a Fischer-Burmeister reformulation of the linear complementarity problem as well as the 
PATH solver^ from [14,19]. We decided to solve the linear complementarity problem up to 
machine precision because its inexact solution may cause an increased number of Newton 
steps. The arising systems of linear equations are solved with a direct solver (MATLAB 
backslash subroutine). 


4.1 Image deblurring 


We consider the deblurring of images which are degraded by horizontal motion blur caused 
by either motion of the camera or the photographed object while taking a photo. Here, 
we proceed as in [29]. Our aim is the reconstruction of the original square image u from 
noisy measurements of the blurred image f. As proposed in [29], we consider the discrete 
problem Ku = f , where u, f G and the Toeplitz matrix 


K = 


2[A^LJ -h 1 


/l • •• 1 0 • • • o\ 

1 ■•. ■•. 0 
0 1 

yo • •• 0 1 • • • ij 


I G 




( 55 ) 


^The code is taken from http://code.google.eom/p/rpi-matlab-simulator/source/browse/ 
simulator/engine/solvers/Lemke/lemke.m (30 June 2015). 

^The code is taken from http://www.mathworks.com/matlabcentral/fileexchange/ 

20952-lcp-mcp-solver—newton-based-/content/LCP.m (30 June 2015). 

®The code is taken from http://pages.cs.wisc.edu/~ferris/path.html (08 February 2016). 
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Figure 1: Reconstruction of a blurred test image with = 128^ pixels containing 5% of 
noise using Algorithm modBSSN with regularization parameter w = 0.9^^ ~ 0.0309, 7 = 10^ 
and blurring parameter L = 0.1. From left to right: original image, blurred noisy image, 
reconstruction. 


Table 1: Performance of Algorithm modBSSN depending on the choice of 7 . The recon¬ 
structions are computed for the image from Figure 1 with 128^ = 16384 pixels containing 
5% of noise, regularization parameter w = 0.9^^ ~ 0.0309 and blurring parameter L = 0.1. 


7 

number of steps 

#{i : tj = 1} 

10^ 

113 

3 

10^ 

23 

5 

10® 

12 

9 

10" 

12 

10 

10® 

11 

8 

10® 

11 

8 

lO’^ 

13 

7 


where the matrix on the left-hand side of the Kronecker product has bandwidth 2 [NL\ +1 
and where I G denotes the identity matrix. The blurring parameter L characterizes 

the motion blurring of the image and we choose L = 0.1. To avoid inverse crime, we 
discretize the problem with the Simpson rule to compute the blurred image f and use 
the discretization (55) to solve the inverse problem. The noise is computed with the 
MATLAB function randn and the noisy blurred image contains 5% relative noise, i.e. 
we have ||f — f ‘^||2 = 5 %||f|| 2 . 

The regularization parameters Wk = w, k = are chosen equal and w is 

computed by the discrepancy principle, see e.g. [3,5,16,49]. More precisely, we choose 
w = 0.9^°, q = 0.9 and r = 2 and set w := wq until the inequality ||Ku^ — f ^||2 < 
T||f — f ^||2 is fulfilled, where u^, denotes the solution to (10) with = w for all k, n = N"^ 
and g{u) = ^||Ku — F^H^- For each computation of u^t,, we choose the minimizer 
of the Tikhonov functional with w = w/0.9 as starting vector. In this subsection, we 
mainly consider Algorithm modBSSN because the performance of BSSN from Section 2.2 for 
quadratic functionals g was discussed in [28]. 

We consider an artificially created sparse image with about 15% nonzero entries, the 
sparseness depends on the number N'^ of pixels, see Figure 1. Here, the original image of 
the size 128 x 128 pixels, the blurred image containing 5% of noise and the reconstruction 
are presented. The blurring parameter is chosen as L = 0.1, the regularization parameter 
is chosen as w = 0.9^^ ~ 0.0309 and the parameter 7 in Algorithm modBSSN is set equal 
to 10 ^. 

Table 1 demonstrates the performance of Algorithm modBSSN for the image from Figure 
1 depending on the choice of 7 > 0. The parameter 7 should not be chosen too small. 
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Table 2: History of the residual norms, the Tikhonov functional values, the stepsizes, 
the system sizes of the linear complementarity problems (LCP) and of the systems of 
linear equations (SLE) and the number of linear systems for the image from Figure 1 with 
128^ = 16384 pixels containing 5% of noise, reconstructed with Algorithm modBSSM with 
regularization parameter w = 0.9^^ ~ 0.0309, the parameter 7 = 10^ and L = 0.1. 


j 

|lF(ub ))||2 

J(ubl) 

^3 

size of LCP 

size of SLE 

# SLE 

0 

2.3200e+06 

357.1522 

- 

- 

- 

- 

1 

8.8461e+02 

105.8198 

1 

0 

6043 

1 

2 

7.9131e+02 

76.3797 

1 

616 

3944 

12441 

3 

5.7716e+02 

66.8937 

0.5 

441 

3161 

13224 

4 

4.2644e+02 

59.0065 

0.5 

321 

2769 

13616 

5 

2.3887e+02 

51.8326 

1 

220 

2574 

13811 

6 

2.0991e+02 

49.7913 

0.5 

90 

2452 

13933 

7 

1.8523e+02 

47.4883 

1 

67 

2376 

14009 

8 

4.8111e+01 

46.7994 

1 

22 

2357 

14028 

9 

4.1728e-01 

46.4408 

1 

13 

2330 

14055 

10 

2.4710e-01 

46.4212 

1 

0 

2326 

1 

11 

4.5635e-10 

46.4061 

1 

0 

2325 

1 


because the number of Newton steps increases and the amount of steps with stepsize t^ = 1 
decreases for smaller 7. For 7 = 10^, the stepsizes are chosen equal to 1 in 10 out of 12 
steps. 

The strict decrease of the residual norm in Algorithm modBSSN for the image from 
Figure 1 with 128^ = 16384 pixels is demonstrated in Table 2. Here, the Tikhonov 
functional values 

;= ^— ^^112 + 111, j = 0 ,1 , ... 

are strictly decreasing as well, but this is not guaranteed in general. The stepsizes are 
eventually chosen equal to 1 ensuring the locally quadratic convergence of Algorithm 
modBSSN. The sizes of the linear complementarity problem (LCP), see (28), and of the 
systems of linear equations (SLE), solved in each step of Algorithm modBSSN to compute 
the matrix N from (29), the vector z from (30) and the Newton direction (39), are usually 
decreasing in the course of the iteration. Regarding the number 128^ = 16384 of pixels, 
these systems are small. This is due to the structure of Algorithm modBSSN. Because of the 
starting vector = 0, the set X=*=(u*^^^) is usually empty so that there is usually no LCP 
to solve in the first step. For other starting vectors the size of the LCP in the hrst 
step may be larger. If a linear complementarity problem is set up in step j, additionally 
|X(u('^^)| linear systems with the same matrix have to be solved, cf. Section 2.3. 

In Table 3, hve algorithms for the deblurring of the noisy image from Figure 1 are 
compared: Algorithm modBSSN and Algorithm hybridBSSN with the choice 7 = 10^, the 
globalized semismooth Newton method (BSSN) from [28] with the choice 7 = 10^, sparse 
reconstruction by separable approximation^ (SpaRSA) from [54] and Barzilai-Borwein gra¬ 
dient projection for sparse reconstruction^ (GPSR_BB) from [20]. Note that runtime is 
implementation-dependent. Note also that BSSN differs from modBSSN only in the choice 
of the index sets (11)-(15) resp. the modified index sets (23)-(27). By the modification of 
the index sets, the theoretical drawback that BSSN may fail to converge was eliminated. 

^The implementations of SpaRSA, and GPSR_BB are taken from http://www.lx.it.pt/~mtf/SpciRSA/ 
and http://www.lx.it.pt/~mtf/GPSR/ respectively (30 June 2015). 
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Table 3: Comparison of different algorithms for the deblurring of the image from Figure 
1 with N = 128^ pixels, 5% of noise, blurring parameter L = 0.1 and regularization 
parameter w = 0.9^^ ~ 0.0309. The starting vector = 0 is chosen for all algorithms. 


algorithm 

average runtime(s) 

Jend 

T T* 

^ end •J 

# iterations 

# zeros 

modBSSN 

8.54 

46.4061 

1.4211e-14 

11 

14059 

BSSN 

0.32 

46.4061 

0 

13 

14059 

hybridBSSN 

0.32 

46.4061 

0 

13 

14059 

SpaRSA 

1.96 

46.4061 

9.6221e-08 

1057 

14057 

GPSR_BB 

15.21 

46.4061 

9.9948e-08 

6352 

14059 



0 0.2 0.4 

runtime(s) 


-BSSN 
•■modBSSN 
hybridBSSN 
- SpaRSA 
■GPSR^BB 




runtime(s) 


runtime(s) 


Figure 2: Runtime history of the difference of the Tikhonov functional values and J* for 
different noise levels 5. From left to right: 6 = 10%, 5%, 1%. The gray line marks the 
target 25^ in each case. 


Therefore, one has to solve mixed linear complementarity problems instead of solving only 
systems of linear equations. In practice, applying BSSN, complementarity problems usually 
do not appear. The stopping criterion of Algorithm modBSSN, Algorithm hybridBSSN and 
Algorithm BSSN is a residual norm ||F(u('^))|| 2 < 10 The other three algorithms are 
terminated if the Tikhonov functional value falls below the threshold J* + 10“^, where 
J* denotes the Tikhonov functional value of hybridBSSN at convergence. The average 
runtime (clock time) of five runs with starting vector = 0, the Tikhonov functional 
value Jend termination, the difference of Jend to J*: the number of iterations and the 
number of zeros of the computed solution are listed for the different algorithms. All al¬ 
gorithms produce sparse solutions with 14059 resp. 14057 zero components, i.e. about 
14.2% nonzero entries. The semismooth Newton methods need only few iterations com¬ 
pared to the other methods. The fastest algorithms are BSSN and hybridBSSN followed 
by SpaRSA, modBSSN and GPSR_BB. The runtime of Algorithm modBSSN may be improved 
by using another solver for the linear complementarity problems. In Table 3 and in the 
following runtime measurements, the computation of the regularization parameter by the 
discrepancy principle is not included in the listed runtimes. The runtimes are measured 
with the MATLAB command tic toe. 

The runtime history of the difference — J* of the Tikhonov functional values 

of the algorithms considered in Table 3 to the Tikhonov functional value J* of 
Algorithm hybridBSSN at convergence is shown in Figure 2 for different noise levels 6 = 
10%, 5%, 1%. The parameter 7 in the algorithms BSSN, modBSSN and hybridBSSN is 
chosen equal to 10^ and we set L = 0.1 and N'^ = 128^. Depending on the noise level, 
it may be adequate to solve the minimization problem only up to an expected accuracy. 
£i-Tikhonov regularization with a posteriori parameter choice by the discrepancy principle 
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number of pixels 


number of pixels 


Figure 3: Average runtime and average cputime of 5 runs depending on the number 
N'^ = {32k)‘^, k = 3,... ,8 of pixels for images containing 5% of noise. 

has a linear convergence rate, see [25], i.e. jju^^— 5 II — C(5||f||, where c > 0 is a constant, 
denotes the true solution to Ku = f with unperturbed right-hand side f and ^ denotes 
the solution to ( 6 ) with perturbed data f^, regularization parameter w and noiselevel 6. 
Therefore, we decided to minimize the Tikhonov functional up to an accuracy of 25^. For 
high noise levels and N'^ = 128^, Algorithm SpaRSA outperforms BSSN and hybridBSSN 
because it reaches the target first. If the minimization problem is solved more accurately 
in case of smaller noise levels or if the number N'^ of pixels increases, BSSN and hybridBSSN 
are advantageous in terms of runtime in this example, cf. Figure 3. 

Figure 3 presents a clock time and a cputime comparison of the considered algorithms 
for increasing image sizes N‘^ = (32/c)^, fc = 3,..., 8 . The cputime is measured with the 
MATLAB subroutine cputime. Once again, the blurring parameter is L = 0.1 and the 
images contain 5% of noise. The starting vector is = 0 for all methods, the stopping 
criterion < J* -|-2<5^ for GPSR_BB and SpaRSA is chosen as in Figure 2 and we choose 

7 = 10^ and the stopping criterion ||F(u (-^))||2 < 10“^ for BSSN, hybridBSSN and modBSSN. 
Again, the average runtimes resp. cputimes of 5 runs are shown. Algorithms BSSN and 
hybridBSSN outperform the other algorithms regarding cputime in this example, followed 
by SpaRSA, GPSR_BB and modBSSN. However, SpaRSA and GPSR_BB are better parallelizable 
than the B-semismooth Newton methods. 

4.2 Robust regression 

Given data ai,..., G K” and y G M™', m > n, our aim is to fit a linear model Au = y 
with u G and A = (ai • • • am)"*" G to the given data. Errors in data collection may 

cause outliers, and robust M-estimators give less influence to outliers than the ordinary 
least squares approach [23]. Here, we choose the well-known L 1 -L 2 estimator, see e.g. [10]. 
For a parameter p > 0, the measure function Tfp: M —)• M]]", (Pp{x) := 2{-s/p~+~x^j2 — y/p) 
fulfills the conditions ^p{x) = (pp{—x) and ipp is strictly convex [2,10]. We choose p = 1 
and the discrepancy term g: M” —)> M, 

.. m ri , _ 

k=l k=l 

To additionally obtain a sparse regression model, we add an .^i-penalty term 

min g(u) -|- t(;||u||i, 
uSM" 


( 56 ) 
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Figure 4: Sparsity of the regression model depending on the choice of the regularization 
parameter w. Left: path of the computed regression coefficients. Right: number of 
nonzero regression coefficients (red dashed line: number of nonzero coefficients of the true 
regression model). 


cf. (10), where the parameter w > 0 acts as regularization parameter, see e.g. [1,37]. In 
the following, we assume that A = (ai • • is injective. Then, the Hessian 

V^g{u) is positive definite for all u G M"'. However, it is not ensured that the level sets of 
0 stay bounded. The data ai,..., a^ G M"' are chosen normally distributed with standard 
deviation 1 and mean 0. We compute y*^ = Au + e, where e ~ Af{0, 1) and for a portion 
of the entries of we choose e ~ AA(0, 50), i.e. we construct outliers. 

If the underlying model is unknown, there are several possibilities to select the regu¬ 
larization parameter w. For example, cross-validation may be used as proposed in [1,51]. 
Here, we assume that the true model is known. Similar to the parameter choice strategy 
proposed in [36], we choose the regularization parameter w so that ^{k : {uw)k / 0} is 
equal to the number of nonzero elements of the true solution and has minimal standard 
error 


a = 


\ 


m — n — 1 


E( 

k=l 




- yfc)^ 


(57) 


respectively maximal R^-value 


= l -- 


^lEiy-ykV 


k=l 


(58) 


where denotes the vector of computed regression coefficients for the regularization 
parameter w. Therefore, we minimize (56) for w = zz/10000, i' = 1,..., 9000 and choose 
the starting vector for v > 1 as the solution to (56) of the last computation with 
w = {v — 1)/10000. The true model is of the size m = 10000, n = 100 and has 8 nonzero 
coefficients with weights —33, —7, —0.1, 1, 2, 13, 20 and 50. The noisy vector y*^ contains 
10% outliers. Figure 4 demonstrates the influence of the regularization parameter w > 0 
on the sparsity of the regression model. For the computations, we set 7 = 10 and the 
tolerance equal to 10“^ in Algorithm modBSSN. For very small w, all coefficients are chosen 
nonzero. If w is chosen larger than 0.8474, all coefficients are chosen equal to zero. 

Figure 5 shows the convergence properties of Algorithm modBSSN and the B-semismooth 
Newton method (BSSN) from Section 2.2 for the example from Figure 4. We choose 
w = 0.0201 and 7 = 10. Both algorithms converge within 6 steps and the chosen stepsizes 
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Figure 5: Illustration of the performance of Algorithm modBSSN and Algorithm BSSN with 
w = 0.0201 and 7 = 10 for the robust regression example. From left to right: true 
regression coefficients, residual norms, chosen stepsizes (identical for both algorithms). 


of the two algorithms coincide in this example. The stepsizes are four times chosen equal 
to 1. For other values of 7 , more Newton steps need to be computed. 

5 Conclusion 

In the present paper, we are concerned with the efficient minimization of functionals of the 
type (1). In [28], a globalized B-semismooth Newton method was presented for quadratic 
discrepancy terms. Here, we generalized the method from [28] to nonquadratic discrep¬ 
ancy terms. Additionally, by modifying index subsets, a modified algorithm was shown 
to be globally convergent without any additional requirements on the a priori unknown 
accumulation point of the sequence of iterates. Thus, we have overcome a theoretical draw¬ 
back of [28] concerning global convergence. Another advantage of the presented modihed 
method is its local convergence speed. If an additional assumption is fulfilled, we have 
shown that the stepsizes are chosen eventually equal to 1 and locally quadratic convergence 
is achieved. 

By design, the proposed modified B-semismooth Newton method requires the solution 
of one linear complementarity problem per iteration, instead of one linear system as in 
other generalized Newton schemes. However, we have demonstrated that these systems 
stay small relative to the number of unknowns and therefore do not spoil the overall 
complexity. A hybrid version combines the efficiency of the B-semismooth Newton method 
and the convergence properties of the modified method. 

In further research, one may focus on the development of globally convergent inexact 
Newton methods as proposed in [15] for the smooth case enabling the design of matrix- 
free variants. Moreover, the globalization of quasi-Newton methods like [40] could be 
considered. 
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